Published October 22, 2024
By Kimberly Mann Bruch, SDSC Communications
The San Diego Supercomputer Center (SDSC), part of the School of Computing, Information and Data Sciences at UC San Diego, has led the Open Storage Network (OSN) program for years. Together with its collaborators – Massachusetts Green High Performance Computing Center (MGHPCC), National Center for Supercomputing Applications, Renaissance Computing Institute, Johns Hopkins University and Rice University – SDSC invites the community to apply for campus computing storage space on the system.
OSN strives to provide a low-cost, high-quality, sustainable distributed storage cloud for the research community. Most recently, SDSC Research Data Services (RDS) Director Christine Kirkpatrick has worked with MGHPCC Executive Director John Goodhue to expand the OSN to include more pods (i.e., storage nodes) at additional sites throughout the U.S., and the program welcomes new campus computing partners.
“The past decade has seen rapid growth of data sets from scientific instruments, simulations, internet postings and other sources – allowing new insights through big data analytics and more recently training of AI models,” Goodhue said. “This torrent of data has created a need for a platform that supports simple and cost-efficient storage and sharing of large volumes of data.”
In 2018, the OSN set out to address that need with a pilot that was sponsored by the U.S. National Science Foundation (NSF). Since then, the platform has evolved into a large-scale production storage system that supports allocations of up to 50 terabytes at no charge via the NSF ACCESS program, as well as paid participation for entities with larger needs.
“While the OSN started as a way to store and share data across geographically distributed sites, it has also been useful as a way to share data between different groups on individual campuses,” Kirkpatrick said. “OSN continues to expand with several new pod sites for projects by the principal investigators writing campus computing storage into their proposals.”
She explained that data on the OSN is accessed via the S3 RESTful API, a de facto standard that supports easy access to shared data across geographic and administrative boundaries for both open and protected data sets. Equally important is the variety of software that supports different types of access: utilities such as Rclone or Globus for high-speed data transfer, gateways that map OSN storage to a local NFS API, application libraries that provide direct access to OSN storage for R, Python, Julia and other programming environments, and research data management platforms that use the network’s back-end storage.
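As a rough illustration of what access through the S3 RESTful API can look like, the sketch below builds the request URLs for an object and for a bucket listing, and parses a listing response. It is a minimal sketch using only the Python standard library, not OSN's documented client workflow. The endpoint and bucket name are hypothetical placeholders, and it assumes the pod accepts path-style URLs and returns listings in the standard S3 XML namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

# Hypothetical values for illustration only; real pod endpoints and bucket
# names come with an allocation and are not given in this article.
ENDPOINT = "https://pod.example.org"
BUCKET = "example-bucket"

# XML namespace used by S3 listing responses (assumed to apply to the pod).
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def object_url(endpoint, bucket, key):
    """Path-style URL for one object: <endpoint>/<bucket>/<key>."""
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"


def list_url(endpoint, bucket, prefix=""):
    """URL for an S3 ListObjectsV2 request, optionally limited to a prefix."""
    params = {"list-type": "2"}
    if prefix:
        params["prefix"] = prefix
    return f"{endpoint.rstrip('/')}/{quote(bucket)}?{urlencode(params)}"


def parse_listing(xml_text):
    """Extract (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext(f"{S3_NS}Key"), int(item.findtext(f"{S3_NS}Size")))
        for item in root.iter(f"{S3_NS}Contents")
    ]
```

For an open data set, the URL from list_url could be fetched with urllib.request.urlopen and the response body passed to parse_listing. Protected data sets require signed requests, which S3 client libraries and tools such as Rclone handle on the user's behalf.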
“Use of the S3 API makes it possible to structure the OSN as a distributed network of storage pods, where each participating site houses one or more pods containing one or two petabytes (PB) of storage, connected to Internet2 at speeds ranging from 10 to 100 Gigabits per second,” Goodhue said. “While every pod conforms to a standard system design and runs the same software stack, the OSN also supports ‘virtual pods’ where the backing store is a subset of a larger system.”
Over the past few years, the OSN has grown to 17 sites housing more than 35 petabytes of storage, and it continues to expand. The OSN pod hardware design and software stack are operated, maintained and enhanced by a distributed engineering and operations team drawn from participating sites. This collaborative approach allows the pooling of expertise and avoids dependence on any single individual or institution for continued success. Oversight and governance are provided by a leadership team that is also drawn from participating sites.
“We built the Open Storage Network on the premise that a collaborative effort, rooted in the research computing community, could address an acute need for simple and cost-efficient storage and sharing of scientific data. Results so far have exceeded expectations,” said Johns Hopkins University Bloomberg Distinguished Computer Science Professor Alex Szalay, who was the founder of the OSN project.
To learn more about becoming a campus computing storage partner, please see the detailed OSN webinar slides.